Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Authors

Gábor Dániel Balogh

I. Z. Reguly

Gihan R. Mudalige

Abstract

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL), each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options, and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. The results show that clang's CUDA compiler frequently outperforms NVIDIA's nvcc, that directive-based approaches have performance issues on complex kernels, and that OpenMP 4 support is maturing in clang and XL, where it is currently around 10% slower than CUDA.

Similar resources

Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)

Although sparse matrix-vector multiplication (SpMV) algorithms are simple, they form important parts of linear algebra algorithms in mathematics and physics. Because these algorithms can be run in parallel, Graphics Processing Units (GPUs) have been considered among the best candidates for running them. In recent years, power consumption has been considered as one of the metr...

Semi-automatic Parallelisation of Unstructured Mesh Codes Using Domain Decomposition

In this paper we discuss enhancements to a suite of semi-automatic parallelisation tools to enable unstructured mesh (irregular) computational mechanics (CM) codes to be rapidly parallelised using SPMD domain decomposition techniques. This work draws upon the dependence analysis and code generation techniques that were originally developed for structured mesh (regular) FORTRAN codes and have be...

Integrated flow and stress using an unstructured mesh on distributed memory parallel systems

Domain decomposition methods can be successfully applied to the parallelisation of existing unstructured mesh computational mechanics codes. Such codes tend to be large and so a structured approach to their parallelisation is required. Algorithmic modification of order-dependent iterative solvers is inevitable, but shown to be of little consequence. A well balanced mesh partition may be demonstr...

Modelling Continuum Mechanics Phenomena using Three Dimensional Unstructured Meshes on Massively Parallel Processors

Unstructured mesh codes for modelling continuum physics phenomena have evolved to provide the facility to model complex interacting systems. Such codes have the potential to provide a high performance on parallel platforms for a small investment in programming. Single Program Multi Data (SPMD) domain decomposition techniques have been demonstrated to provide the required parameters of high para...

Improving Locality of Unstructured Mesh Algorithms on GPUs

To most efficiently utilize modern parallel architectures, the memory access patterns of algorithms must make heavy use of the cache architecture: successively accessed data must be close in memory (spatial locality) and one piece of data must be reused as many times as possible (temporal locality). In this work we analyse the performance of unstructured mesh algorithms on GPUs, specifically th...

Publication date: 2017
